Chapter 4 Visualization

In this section we’ll explore visualization methods in R. Visualization has been a key element of R since its inception, since visualization is central to the exploratory philosophy of the language. The base plot system generally does a good job in coming up with the most likely graphical output based on the data you provide.

plot(penguins$body_mass_g, penguins$flipper_length_mm)
Flipper length by species

Figure 4.1: Flipper length by species

plot(penguins$species, penguins$flipper_length_mm)
Flipper length by species

Figure 4.2: Flipper length by species

4.1 ggplot2

We’ll mostly focus however on gpplot2, based on the Grammar of Graphics because it provides considerable control over your graphics while remaining fairly easily readable, as long as you buy into its grammar.

ggplot2 looks at three aspects of a graph:

  • data : where are the data coming from?
  • geometry : what type of graph are we creating?
  • aesthetics : what choices can we make about symbology and how do we connect symbology to data?

See https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

The ggplot2 system provides plots of single and multiple variables, using various coordinate systems (including geographic).

4.2 Plotting one variable

  • continuous
    • histograms
    • density plots
    • dot plots
  • discrete
    • bar
library(iGIScData)
library(tidyverse)
summary(XSptsNDVI)
##     DistNtoS       elevation     vegetation       
##  Min.   :  0.0   Min.   :1510   Length:29         
##  1st Qu.: 37.0   1st Qu.:1510   Class :character  
##  Median :175.0   Median :1511   Mode  :character  
##  Mean   :164.7   Mean   :1511                     
##  3rd Qu.:275.5   3rd Qu.:1511                     
##  Max.   :298.8   Max.   :1511                     
##    geometry          NDVIgrowing     NDVIsenescent   
##  Length:29          Min.   :0.3255   Min.   :0.1402  
##  Class :character   1st Qu.:0.5052   1st Qu.:0.2418  
##  Mode  :character   Median :0.6169   Median :0.2817  
##                     Mean   :0.5901   Mean   :0.3662  
##                     3rd Qu.:0.6768   3rd Qu.:0.5407  
##                     Max.   :0.7683   Max.   :0.7578
ggplot(XSptsNDVI, aes(vegetation)) + 
  geom_bar()

4.2.1 Histogram

First, to prepare the data, we need to use a pivot_longer on XSptsNDVI:

XSptsPheno <- XSptsNDVI %>%
  filter(vegetation != "pine") %>%
  pivot_longer(cols = starts_with("NDVI"), names_to = "phenology", values_to = "NDVI") %>%
  mutate(phenology = str_sub(phenology, 5, str_length(phenology)))
XSptsPheno <- read_csv("data/XSptsPheno.csv")
## Parsed with column specification:
## cols(
##   DistNtoS = col_double(),
##   elevation = col_double(),
##   vegetation = col_character(),
##   geometry = col_character(),
##   phenology = col_character(),
##   NDVI = col_double()
## )
XSptsPheno %>%
  ggplot(aes(NDVI)) + 
  geom_histogram(binwidth=0.05)
Distribution of NDVI, Knuthson Meadow

Figure 4.3: Distribution of NDVI, Knuthson Meadow

Normal histogram: easier to visualize the distribution, see modes

sierraData %>%
    ggplot(aes(TEMPERATURE)) +
  geom_histogram(fill="dark green")
## `stat_bin()` using `bins = 30`. Pick better value with
## `binwidth`.
Distribution of Average Monthly Temperatures, Sierra Nevada

Figure 4.4: Distribution of Average Monthly Temperatures, Sierra Nevada

Cumulative histogram with proportions: easier to see percentiles, median

n <- length(sierraData$TEMPERATURE)
sierraData %>%
  ggplot(aes(TEMPERATURE)) +
  geom_histogram(aes(y=cumsum(..count..)/n), fill="dark goldenrod")
## `stat_bin()` using `bins = 30`. Pick better value with
## `binwidth`.
Cumulative Distribution of Average Monthly Temperatures, Sierra Nevada

Figure 4.5: Cumulative Distribution of Average Monthly Temperatures, Sierra Nevada

4.2.2 Density Plot

Density represents how much out of the total. The total area (sum of widths of bins times densities of that bin) adds up to 1.

XSptsPheno %>% 
  ggplot(aes(NDVI)) + 
  geom_density()
Density plot of NDVI, Knuthson Meadow

Figure 4.6: Density plot of NDVI, Knuthson Meadow

Note that NDVI values are <1 so bins are very small numbers, so in this case densities can be >1.

Using alpha and mapping phenology as fill color. This illustrates two useful ggplot methods:

  • “mapping” a variable (phenology) to an aesthetic property (fill color of the density polygon)
  • setting a a property (alpha = 0.2) to all polygons of the density plot. The alpha channel of colors defines its opacity, from invisible (0) to opaque (1) so is commonly used to set as its reverse, transparency.
XSptsPheno %>%
  ggplot(aes(NDVI, fill=phenology)) +
  geom_density(alpha=0.2)

tidy_eucoak %>%
  ggplot(aes(log(runoff_L),fill=tree)) +
  geom_density(alpha=0.2)
Runoff under Eucalyptus and Oak in Bay Area sites

Figure 4.7: Runoff under Eucalyptus and Oak in Bay Area sites

4.2.3 boxplot

ggplot(data = tidy_eucoak) +
  geom_boxplot(aes(x = site, y = runoff_L))
Runoff under Eucalyptus and Oak, Bay Area Sites

Figure 4.8: Runoff under Eucalyptus and Oak, Bay Area Sites

Get color from tree within aes()

ggplot(data = tidy_eucoak) +
  geom_boxplot(aes(x=site, y=runoff_L, color=tree))
Runoff at Bay Area Sites, colored as Eucalyptus and Oak

Figure 4.9: Runoff at Bay Area Sites, colored as Eucalyptus and Oak

Visualizing soil CO_2_ data with a Tukey box plot

co2 <- soilCO2_97
co2$SITE <- factor(co2$SITE)  # in order to make the numeric field a factor
ggplot(data = co2, mapping = aes(x = SITE, y = `CO2%`)) +
  geom_boxplot()
Visualizing soil CO_2_ data with a Tukey box plot

Figure 4.10: Visualizing soil CO_2_ data with a Tukey box plot

4.3 Plotting two variables

4.3.1 Two continuous variables

We’ve looked at this before – the scatterplot

ggplot(data=sierraFeb) + 
  geom_point(mapping = aes(TEMPERATURE, ELEVATION))
Scatter plot of February temperature vs elevation

Figure 4.11: Scatter plot of February temperature vs elevation

  • The aes (“aesthetics”) function specifies the variables to use as x and y coordinates
  • geom_point creates a scatter plot of those coordinate points

Set color for all (not in aes())

ggplot(data=sierraFeb) + 
  geom_point(aes(TEMPERATURE, ELEVATION), color="blue")

  • color is defined outside of aes, so is applies to all points.
  • mapping is first argument of geom_point, so mapping = is not needed.

4.3.2 Two variables, one discrete

ggplot(tidy_eucoak) +
  geom_bar(aes(site, runoff_L), stat="identity")
Two variables, one discrete

Figure 4.12: Two variables, one discrete

4.4 Color systems

You can find a lot about color systems. See these sources:

http://sape.inf.usi.ch/quick-reference/ggplot2/colour http://applied-r.com/rcolorbrewer-palettes/

4.4.1 Color from variable, in aesthetics

In this graph, color is defined inside aes, so is based on COUNTY

ggplot(data=sierraFeb) + 
  geom_point(aes(TEMPERATURE, ELEVATION, color=COUNTY))
Color set within aes()

Figure 4.13: Color set within aes()

Plotting lines using the same x,y in aesthetics

sierraFeb %>%
  ggplot(aes(TEMPERATURE,ELEVATION)) +
  geom_point(color="blue") +
  geom_line(color="red")
Using aesthetics settings for both points and lines

Figure 4.14: Using aesthetics settings for both points and lines

Note the use of pipe to start with the data then apply ggplot.

River map & profile

x <- c(1000, 1100, 1300, 1500, 1600, 1800, 1900)
y <- c(500, 700, 800, 1000, 1200, 1300, 1500)
z <- c(0, 1, 2, 5, 25, 75, 150)
d <- rep(NA, length(x))
longd <- rep(NA, length(x))
s <- rep(NA, length(x))
for(i in 1:length(x)){
  if(i==1){longd[i] <- 0; d[i] <-0}
  else{
    d[i] <- sqrt((x[i]-x[i-1])^2 + (y[i]-y[i-1])^2)
    longd[i] <- longd[i-1] + d[i]
    s[i-1] <- (z[i]-z[i-1])/d[i]}}
longprofile <- bind_cols(x=x,y=y,z=z,d=d,longd=longd,s=s)
ggplot(longprofile, aes(x,y)) +
  geom_line(mapping=aes(col=s), size=1.2) + 
  geom_point(mapping=aes(col=s, size=z)) +
  coord_fixed(ratio=1) + scale_color_gradient(low="green", high="red") +
  ggtitle("Simulated river path, elevations and slopes")
Longitudinal Profiles

Figure 4.15: Longitudinal Profiles

ggplot(longprofile, aes(longd,z)) + geom_line(aes(col=s), size=1.5) + geom_point()  +
  scale_color_gradient(low="green", high="red") +
  ggtitle("Elevation over longitudinal distance upstream")
Longitudinal Profiles

Figure 4.16: Longitudinal Profiles

ggplot(longprofile, aes(longd,s)) + geom_point(aes(col=s), size=3) +
  scale_color_gradient(low="green", high="red") +
  ggtitle("Slope over longitudinal distance upstream")
## Warning: Removed 1 rows containing missing values (geom_point).
Longitudinal Profiles

Figure 4.17: Longitudinal Profiles

#summary(lm(s~longd, data=longprofile))

4.4.2 Trend line

sierraFeb %>%
  ggplot(aes(TEMPERATURE,ELEVATION)) +
  geom_point(color="blue") +
  geom_smooth(color="red", method="lm")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 20 rows containing non-finite values
## (stat_smooth).
## Warning: Removed 20 rows containing missing values (geom_point).
Trend line using geom_smooth with a linear model

Figure 4.18: Trend line using geom_smooth with a linear model

4.4.3 General symbology

A useful vignette accessed by vignette("ggplot2-specs") lets you see aesthetic specifications for symbols, including:

  • Color & fill
  • Lines
    • line type, size, ends
  • Polygon
    • border color, linetype, size
    • fill
  • Points
    • shape
    • size
    • color & fill
    • stroke
  • Text
    • font face & size
    • justification

4.4.3.1 Categorical symbology

One example of a “Big Data” resource is EPA’s Toxic Release Inventory that tracks releases from a wide array of sources, from oil refineries on down. One way of dealing with big data in terms of exploring meaning is to use symbology to try to make sense of it.

csvPath <- system.file("extdata","TRI_2017_CA.csv", package="iGIScData")
TRI <- read_csv(csvPath) %>%
  filter(`5.1_FUGITIVE_AIR` > 100 & `5.2_STACK_AIR` > 100)
## Warning: Missing column names filled in: 'X110' [110]
## Warning: 3807 parsing failures.
## row col    expected      actual                                                                               file
##   1  -- 110 columns 109 columns 'C:/Users/900008452/Documents/R/win-library/4.0/iGIScData/extdata/TRI_2017_CA.csv'
##   2  -- 110 columns 109 columns 'C:/Users/900008452/Documents/R/win-library/4.0/iGIScData/extdata/TRI_2017_CA.csv'
##   3  -- 110 columns 109 columns 'C:/Users/900008452/Documents/R/win-library/4.0/iGIScData/extdata/TRI_2017_CA.csv'
##   4  -- 110 columns 109 columns 'C:/Users/900008452/Documents/R/win-library/4.0/iGIScData/extdata/TRI_2017_CA.csv'
##   5  -- 110 columns 109 columns 'C:/Users/900008452/Documents/R/win-library/4.0/iGIScData/extdata/TRI_2017_CA.csv'
## ... ... ........... ........... ..................................................................................
## See problems(...) for more details.
ggplot(data = TRI, aes(log(`5.2_STACK_AIR`), log(`5.1_FUGITIVE_AIR`), 
                       color = INDUSTRY_SECTOR)) +
       geom_point()
EPA Toxic Release Inventory, as a big data set needing symbology clarification

Figure 4.19: EPA Toxic Release Inventory, as a big data set needing symbology clarification

4.4.3.2 Graphs from grouped data

XSptsPheno %>%
  ggplot() +
  geom_point(aes(elevation, NDVI, shape=vegetation, 
                 color = phenology), size = 3) +
  geom_smooth(aes(elevation, NDVI, 
                 color = phenology), method="lm") 
## `geom_smooth()` using formula 'y ~ x'
NDVI symbolized by vegetation in two seasons

Figure 4.20: NDVI symbolized by vegetation in two seasons

ggplot(data = tidy_eucoak) +
  geom_point(mapping = aes(x = rain_mm, y = runoff_L, color = tree)) +
  geom_smooth(mapping = aes(x = rain_mm, y= runoff_L, color = tree), 
              method = "lm") +
  scale_color_manual(values = c("seagreen4", "orange3"))
## `geom_smooth()` using formula 'y ~ x'
Eucalyptus and Oak: rainfall and runoff

Figure 4.21: Eucalyptus and Oak: rainfall and runoff

4.4.3.3 Faceted graphs

This is another option to displaying groups of data, with parallel graphs

ggplot(data = tidy_eucoak) +
  geom_point(aes(x=rain_mm,y=runoff_L)) +
  geom_smooth(aes(x=rain_mm,y=runoff_L), method="lm") +
  facet_grid(tree ~ .)
## `geom_smooth()` using formula 'y ~ x'
Faceted graph alternative

Figure 4.22: Faceted graph alternative

4.5 Titles and subtitles

ggplot(data = tidy_eucoak) +
  geom_point(aes(x=rain_mm,y=runoff_L, color=tree)) +
  geom_smooth(aes(x=rain_mm,y=runoff_L, color=tree), method="lm") +
  scale_color_manual(values=c("seagreen4","orange3")) +
  labs(title="rainfall ~ runoff", 
       subtitle="eucalyptus & oak sites, 2016")
## `geom_smooth()` using formula 'y ~ x'
Titles added

Figure 4.23: Titles added

4.6 Pairs Plot

sierraFeb %>%
  select(LATITUDE, ELEVATION, TEMPERATURE, PRECIPITATION) %>%
  pairs()
Pairs plot for Sierra Nevada stations variables

Figure 4.24: Pairs plot for Sierra Nevada stations variables

There are many versions of pairs plots. Here’s one from GGally (need to install.packages("GGally") first):

library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
sierraFeb %>%
   select(LATITUDE, ELEVATION, TEMPERATURE, PRECIPITATION) %>%
   ggpairs()
## ```

4.7 plot: [1,1] [=>———————————–] 6% est: 0s

4.8 plot: [1,2] [====>——————————–] 12% est: 0s

4.9 plot: [1,3] [======>——————————] 19% est: 0s

4.10 plot: [1,4] [========>—————————-] 25% est: 0s

4.11 plot: [2,1] [===========>————————-] 31% est: 0s

4.12 plot: [2,2] [=============>———————–] 38% est: 0s

4.13 plot: [2,3] [===============>———————] 44% est: 0s

4.14 plot: [2,4] [=================>——————-] 50% est: 0s

4.15 plot: [3,1] [====================>—————-] 56% est: 0s

4.16 plot: [3,2] [======================>————–] 62% est: 0s

4.17 plot: [3,3] [========================>————] 69% est: 0s

4.18 plot: [3,4] [===========================>———] 75% est: 0s

4.19 plot: [4,1] [=============================>——-] 81% est: 0s

4.20 plot: [4,2] [===============================>—–] 88% est: 0s

4.21 plot: [4,3] [==================================>–] 94% est: 0s

4.22 plot: [4,4] [=====================================]100% est: 0s


<div class="figure">
<img src="EnvDataSci_files/figure-html/unnamed-chunk-101-1.png" alt="Pairs plot in GGally package" width="672" />
<p class="caption">(\#fig:unnamed-chunk-101)Pairs plot in GGally package</p>
</div>

## Exercises

1. Create a bar graph of the counts of the species in the penguins data frame.  What can you say about what it shows?

2. Use bind_cols in dplyr to create a tibble from built-in vectors state.abb and state.region, then use ggplot with geom_bar to create a bar graph of the four regions.

3. Convert the built-in time series `treering` into a tibble `tr`using the `tibble()` functions with the single variable assigned as `treering = treering`, then create a histogram, using that tibble and variable for the `data` and `x` settings needed.



4. Create and display a new tibble `st` using `bind_cols` with `Name=state.name`, `Abb=state.abb`, `Region=state.region`, and a tibble created from `state.x77` with `as_tibble`.  




5. From `st`, create a density plot from the variable `Frost` (number of days with frost for that state).  Approximately what is the modal value?



6. From `st` create a a boxplot of `Area` by `Region`.  Which region has the highest and which has the lowest median Area?  Do the same for `Frost`.



7. From st, compare murder rate (y=Murder) to Frost (x) in a scatter plot, colored by Region.



8. Add a trend line (smooth) with method="lm" to your scatterplot, not colored by Region (but keep the points colored by Region). What can you say about what this graph is showing you?

9. Add a title to your graph.

10. Change your scatterplot to place labels using the Abb variable (still colored by Region) using `geom_label(aes(label=Abb, col=Region))`. Any observations about outliers?




<!--chapter:end:03-visualization.Rmd-->

# Spatial R

We'll explore the basics of simple features (sf) for building spatial datasets, then some common mapping methods, probably:

- ggplot2
- tmap

## Spatial Data 

To work with spatial data requires extending R to deal with it using packages.  Many have been developed, but the field is starting to mature using international open GIS standards.

*`sp`*  (until recently, the dominant library of spatial tools)

- Includes functions for working with spatial data
- Includes `spplot` to create maps
- Also needs `rgdal` package for `readOGR` – reads spatial data frames.  

*`sf`* (Simple Features)

- ISO 19125 standard for GIS geometries
- Also has functions for working with spatial data, but clearer to use.
- Doesn't need many additional packages, though you may still need `rgdal` installed for some tools you want to use.
- Replacing `sp` and `spplot` though you'll still find them in code. We'll give it a try...
- Works with ggplot2 and tmap for nice looking maps.

Cheat sheet: https://github.com/rstudio/cheatsheets/raw/master/sf.pdf

#### simple feature geometry sfg and simple feature column sfc



### Examples of simple geometry building in sf 

sf functions have the pattern st_* 

st means "space and time"

See Geocomputation with R at https://geocompr.robinlovelace.net/ or  https://r-spatial.github.io/sf/
    for more details, but here's an example of manual feature creation of sf geometries (sfg):


```r
library(tidyverse)
library(sf)
library(sf)
eyes <- st_multipoint(rbind(c(1,5), c(3,5)))
nose <- st_point(c(2,4))
mouth <- st_linestring(rbind(c(1,3),c(3, 3)))
border <- st_polygon(list(rbind(c(0,5), c(1,2), c(2,1), c(3,2), 
                              c(4,5), c(3,7), c(1,7), c(0,5))))
face <- st_sfc(eyes, nose, mouth, border)  # sfc = sf column 
plot(face)
Building simple geometries in sf

Figure 4.25: Building simple geometries in sf

The face was a simple feature column (sfc) built from the list of sfgs. An sfc just has the one column, so is not quite like a shapefile.

  • But it can have a coordinate referencing system CRS, and so can be mapped.
  • Kind of like a shapefile with no other attributes than shape

4.22.1 Building a mappable sfc from scratch

CA_matrix <- rbind(c(-124,42),c(-120,42),c(-120,39),c(-114.5,35),
  c(-114.1,34.3),c(-114.6,32.7),c(-117,32.5),c(-118.5,34),c(-120.5,34.5),
  c(-122,36.5),c(-121.8,36.8),c(-122,37),c(-122.4,37.3),c(-122.5,37.8),
  c(-123,38),c(-123.7,39),c(-124,40),c(-124.4,40.5),c(-124,41),c(-124,42))
NV_matrix <- rbind(c(-120,42),c(-114,42),c(-114,36),c(-114.5,36),
  c(-114.5,35),c(-120,39),c(-120,42))
CA_list <- list(CA_matrix);       NV_list <- list(NV_matrix)
CA_poly <- st_polygon(CA_list);   NV_poly <- st_polygon(NV_list)
sfc_2states <- st_sfc(CA_poly,NV_poly,crs=4326)  # crs=4326 specifies GCS
st_geometry_type(sfc_2states)
## [1] POLYGON POLYGON
## 18 Levels: GEOMETRY POINT LINESTRING POLYGON ... TRIANGLE
library(tidyverse)
ggplot() + geom_sf(data = sfc_2states)
A simple map built from scratch with hard-coded data as simple feature columns

Figure 4.26: A simple map built from scratch with hard-coded data as simple feature columns

sf class

Is like a shapefile: has attributes to which geometry is added, and can be used like a data frame.

attributes <- bind_rows(c(abb="CA", area=423970, pop=39.56e6),
                        c(abb="NV", area=286382, pop=3.03e6))
twostates <- st_sf(attributes, geometry = sfc_2states)
ggplot(twostates) + geom_sf() + geom_sf_text(aes(label = abb))
## Warning in st_point_on_surface.sfc(sf::st_zm(x)):
## st_point_on_surface may not give correct results for longitude/
## latitude data
Using an sf class to build a map, displaying an attribute

Figure 4.27: Using an sf class to build a map, displaying an attribute

4.22.2 Creating features from shapefiles or tables

sf’s st_read reads shapefiles

  • shapefile is an open GIS format for points, polylines, polygons

You would normally have shapefiles (and all the files that go with them – .shx, etc.) stored on your computer, but we’ll access one from the iGIScData external data folder:

library(iGIScData)
library(sf)
shpPath <- system.file("extdata","CA_counties.shp", package="iGIScData")
CA_counties <- st_read(shpPath)
## Reading layer `CA_counties' from data source `C:\Users\900008452\Documents\R\win-library\4.0\iGIScData\extdata\CA_counties.shp' using driver `ESRI Shapefile'
## Simple feature collection with 58 features and 60 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -124.4152 ymin: 32.53427 xmax: -114.1312 ymax: 42.00952
## geographic CRS: WGS 84
plot(CA_counties)
## Warning: plotting the first 9 out of 60 attributes; use max.plot =
## 60 to plot all

st_as_sf converts data frames

  • using coordinates read from x and y variables, with crs set to coordinate system (4326 for GCS)
sierraFebpts <- st_as_sf(sierraFeb, coords = c("LONGITUDE", "LATITUDE"), crs=4326)
plot(sierraFebpts)

library(tidyverse)
library(sf)
library(iGIScData)
censusCentroids <- st_centroid(BayAreaTracts)
TRI_sp <- st_as_sf(TRI_2017_CA, coords = c("LONGITUDE", "LATITUDE"), 
        crs=4326) # simple way to specify coordinate reference
bnd <- st_bbox(censusCentroids)
ggplot() +
  geom_sf(data = BayAreaCounties, aes(fill = NAME)) +
  geom_sf(data = censusCentroids) +
  geom_sf(data = CAfreeways, color = "grey") +
  geom_sf(data = TRI_sp, color = "yellow") +
  coord_sf(xlim = c(bnd[1], bnd[3]), ylim = c(bnd[2], bnd[4])) +
  labs(title="Bay Area Counties, Freeways and Census Tract Centroids")
ggplot map of Bay Area TRI sites, census centroids, freeways

Figure 4.28: ggplot map of Bay Area TRI sites, census centroids, freeways

4.22.3 Coordinate Referencing System

Say you have data you need to make spatial with a spatial reference

sierra <- read_csv("sierraClimate.csv")

EPSG or CRS codes are an easy way to provide coordinate referencing.

Two ways of doing the same thing.

  1. Spell it out:
GCS <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
wsta = st_as_sf(sierra, coords = c("LONGITUDE","LATITUDE"), crs=GCS)
  1. Google to find the code you need and assign it to the crs parameter:

wsta <- st_as_sf(sierra, coords = c("LONGITUDE","LATITUDE"), crs=4326)

4.22.3.1 Removing Geometry

There are many instances where you want to remove geometry from a sf data frame

  • Some R functions run into problems with geometry and produce confusing error messages, like “non-numeric argument”

  • You’re wanting to work with an sf data frame in a non-spatial way

One way to remove geometry:

myNonSFdf <- mySFdf %>% st_set_geometry(NULL)

4.22.4 Spatial join st_join

A spatial join with st_join joins data from census where TRI points occur

TRI_sp <- st_as_sf(TRI_2017_CA, coords = c("LONGITUDE", "LATITUDE"), crs=4326) %>%
  st_join(BayAreaTracts) %>%
  filter(CNTY_FIPS %in% c("013", "095"))

4.22.5 Plotting maps in the base plot system

There are various programs for creating maps from spatial data, and we’ll look at a few after we’ve looked at rasters. As usual, the base plot system often does something useful when you give it data.

plot(BayAreaCounties)
## Warning: plotting the first 9 out of 60 attributes; use max.plot =
## 60 to plot all

And with just one variable:

plot(BayAreaCounties["POP_SQMI"])

There’s a lot more we could do with the base plot system, but we’ll mostly focus on some better options in ggplot2 and tmap.

4.23 Raster GIS in R

Simple Features are feature-based, so based on the name I guess it’s not surprising that sf doesn’t have support for rasters. But we can use the raster package for that.

A bit of raster reading and map algebra with Marble Mountains elevation data

library(raster)
rasPath <- system.file("extdata","elev.tif", package="iGIScData")
elev <- raster(rasPath)
slope <- terrain(elev, opt="slope")
aspect <- terrain(elev, opt="aspect")
slopeclasses <-matrix(c(0,0.2,1, 0.2,0.4,2, 0.4,0.6,3,
                        0.6,0.8,4, 0.8,1,5), ncol=3, byrow=TRUE)
slopeclass <- reclassify(slope, rcl = slopeclasses)

plot(elev)

plot(slope)

plot(slopeclass)

plot(aspect)

4.23.1 Raster from scratch

new_raster2 <- raster(nrows = 6, ncols = 6, res = 0.5,
                      xmn = -1.5, xmx = 1.5, ymn = -1.5, ymx = 1.5,
                      vals = 1:36)
plot(new_raster2)

4.24 ggplot2 for maps

The Grammar of Graphics is the gg of ggplot.

  • Key concept is separating aesthetics from data
  • Aesthetics can come from variables (using aes()setting) or be constant for the graph

Mapping tools that follow this lead

  • ggplot, as we have seen, and it continues to be enhanced
  • tmap (Thematic Maps) https://github.com/mtennekes/tmap Tennekes, M., 2018, tmap: Thematic Maps in R, Journal of Statistical Software 84(6), 1-39
ggplot(CA_counties) + geom_sf()

Try ?geom_sf and you’ll find that its first parameters is mapping with aes() by default. The data property is inherited from the ggplot call, but commonly you’ll want to specify data=something in your geom_sf call.

Another simple ggplot, with labels

ggplot(CA_counties) + geom_sf() +
  geom_sf_text(aes(label = NAME), size = 1.5)
## Warning in st_point_on_surface.sfc(sf::st_zm(x)):
## st_point_on_surface may not give correct results for longitude/
## latitude data

and now with fill color

ggplot(CA_counties) + geom_sf(aes(fill = MED_AGE)) +
  geom_sf_text(aes(label = NAME), col="white", size=1.5)
## Warning in st_point_on_surface.sfc(sf::st_zm(x)):
## st_point_on_surface may not give correct results for longitude/
## latitude data

Repositioned legend, no “x” or “y” labels

ggplot(CA_counties) + geom_sf(aes(fill=MED_AGE)) +
  geom_sf_text(aes(label = NAME), col="white", size=1.5) +
  theme(legend.position = c(0.8, 0.8)) +
  labs(x="",y="")

Map in ggplot2, zoomed into two counties:

library(tidyverse); library(sf); library(iGIScData)
census <- BayAreaTracts %>%
   filter(CNTY_FIPS %in% c("013", "095"))
TRI <- TRI_2017_CA %>%
  st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs=4326) %>%
  st_join(census) %>%
  filter(CNTY_FIPS %in% c("013", "095"),
         (`5.1_FUGITIVE_AIR` + `5.2_STACK_AIR`) > 0)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
bnd = st_bbox(census)
ggplot() +
  geom_sf(data = BayAreaCounties, aes(fill = NAME)) +
  geom_sf(data = census, color="grey40", fill = NA) +
  geom_sf(data = TRI) +
  coord_sf(xlim = c(bnd[1], bnd[3]), ylim = c(bnd[2], bnd[4])) +
  labs(title="Census Tracts and TRI air-release sites") +
  theme(legend.position = "none")

4.24.1 Rasters in ggplot2

Raster display in ggplot2 is currently a little awkward, as are rasters in general in the feature-dominated GIS world.

We can use a trick: converting rasters to a grid of points:

library(tidyverse)
library(sf)
library(raster)
rasPath <- system.file("extdata","elev.tif", package="iGIScData")
elev <- raster(rasPath)
shpPath <- system.file("extdata","trails.shp", package="iGIScData")
trails <- st_read(shpPath)
## Reading layer `trails' from data source `C:\Users\900008452\Documents\R\win-library\4.0\iGIScData\extdata\trails.shp' using driver `ESRI Shapefile'
## Simple feature collection with 32 features and 8 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 481903.8 ymin: 4599196 xmax: 486901.9 ymax: 4603200
## projected CRS:  NAD83 / UTM zone 10N
elevpts = as.data.frame(rasterToPoints(elev))
ggplot() +
  geom_raster(data = elevpts, aes(x = x, y = y, fill = elev)) +
  geom_sf(data = trails)

4.25 tmap

Basic building block is tm_shape(data) followed by various layer elements such as tm_fill() shape can be features or raster See Geocomputation with R Chapter 8 “Making Maps with R” for more information. https://geocompr.robinlovelace.net/adv-map.html

library(spData)
## 
## Attaching package: 'spData'
## The following object is masked _by_ '.GlobalEnv':
## 
##     elev
library(tmap)
tm_shape(world) + tm_fill() + tm_borders()

Color by variable

library(sf)
library(tmap)
tm_shape(BayAreaTracts) + tm_fill(col = "MED_AGE")

tmap of sierraFeb with hillshade and point symbols

library(tmap)
library(sf)
library(raster)
tmap_mode("plot")
## tmap mode set to plotting
tmap_options(max.categories = 8)
#sierraFeb <- st_read("data/sierraFeb.csv")
sierra <- st_as_sf(sierraFeb, coords = c("LONGITUDE", "LATITUDE"), crs = 4326)
#hillsh <- raster("data/ca_hillsh_WGS84.tif")
bounds <- st_bbox(sierra)
tm_shape(CAhillsh,bbox=bounds)+
  tm_raster(palette="-Greys",legend.show=FALSE,n=10) + tm_shape(sierra) + tm_symbols(col="TEMPERATURE",
     palette=c("blue","red"), style="cont",n=8) +
  tm_legend() + 
  tm_layout(legend.position=c("RIGHT","TOP"))
## stars object downsampled to 1092 by 915 cells. See tm_shape manual (argument raster.downsample)

Note: “-Greys” needed to avoid negative image, since “Greys” go from light to dark, and to match reflectance as with b&w photography, they need to go from dark to light.

4.25.1 Interactive Maps

The word “static” in “static maps” isn’t something you would have heard in a cartography class 30 years ago, since essentially all maps then were static. Very important in designing maps is considering your audience, and one characteristic of the audience of those maps of yore were that they were printed and thus fixed on paper. A lot of cartographic design relates to that property:

  • Figure-to-ground relationships assume “ground” is a white piece of paper (or possibly a standard white background in a pdf), so good cartographic color schemes tend to range from light for low values to dark for high values.
  • Scale is fixed and there are no “tools” for changing scale, so a lot of attention must be paid to providing scale information.
  • Similarly, without the ability to see the map at different scales, inset maps are often needed to provide context.

Interactive maps change the game in having tools for changing scale, and always being “printed” on a computer or device where the color of the background isn’t necessarily white. We are increasingly used to using interactive maps on our phones or other devices, and often get frustrated not being able to zoom into a static map.

A widely used interactive mapping system is Leaflet, but we’re going to use tmap to access Leaflet behind the scenes and allow us to create maps with one set of commands. The key parameter needed is tmap_mode which must be set to “view” to create an interactive map.

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(BayAreaTracts) + tm_fill(col = "MED_AGE", alpha = 0.5)
library(tmap)
library(sf)
tmap_mode("view")
## tmap mode set to interactive viewing
tmap_options(max.categories = 8)
sierra <- st_as_sf(sierraFeb, coords = c("LONGITUDE", "LATITUDE"), crs = 4326)
bounds <- st_bbox(sierra)
tm_basemap(leaflet::providers$Esri.NatGeoWorldMap) +
  tm_shape(sierra) + tm_symbols(col="TEMPERATURE",
  palette=c("blue","red"), style="cont",n=8,size=0.2) +
  tm_legend() + 
  tm_layout(legend.position=c("RIGHT","TOP"))
## legend.postion is used for plot mode. Use view.legend.position in tm_view to set the legend position in view mode.

4.25.1.1 Leaflet

Now that we’ve seen an app that used it, let’s look briefly at Leaflet itself, and we’ll see that even the Leaflet package in R actually uses JavaScript…

Leaflet is designed as “An open-source JavaScript library for mobile-friendly interactive maps” https://leafletjs.com “The R package leaflet is an interface to the JavaScript library Leaflet to create interactive web maps. It was developed on top of the htmlwidgets framework, which means the maps can be rendered in RMarkdown (v2) documents (which is why you can see it in this document), Shiny apps, and RStudio IDE / the R console.”

https://blog.rstudio.com/2015/06/24/leaflet-interactive-web-maps-with-r/

https://github.com/rstudio/cheatsheets/blob/master/leaflet.pdf

library(leaflet)
m <- leaflet() %>%
  addTiles() %>%  # default OpenStreetMap tiles
  addMarkers(lng=174.768, lat=-36.852,
             popup="The birthplace of R")
m 

4.26 Exercises

  1. Using the method of building simple sf geometries, build a simple 1x1 square object and plot it. Remember that you have to close the polygon, so the first vertex is the same as the last (of 5) vertices.

  2. Build a map in ggplot of Colorado, Wyoming, and Utah with these boundary vertices in GCS. As with the square, remember to close each figure, and assign the crs to what is needed for GCS: 4326.

  • Colorado: \((-109,41),(-102,41),(-102,37),(-109,37)\)
  • Wyoming: \((-111,45),(-104,45),(-104,41),(-111,41)\)
  • Utah: \((-114,42),(-111,42),(-111,41),(-109,41),(-109,37),(-114,37)\)
  • Arizona: \((-114,37),(-109,37),(-109,31.3),(-111,31.3),(-114.8,32.5),(-114.6,32.7),(-114.1,34.3),(-114.5,35),(-114.5,36),(-114,36)\)
  • New Mexico: \((-109,37),(-103,37),(-103,32),(-106.6,32),(-106.5,31.8),(-108.2,31.8),(-108.2,31.3),(-109,31.3)\)
  1. Add in the code for CA and NV and create kind of a western US map…

  2. Create an sf class from the five states adding the fields name, abb, area_sqkm, and population, and create a map labeling with the name.

  • Colorado, CO, 269837, 5758736
  • Wyoming, WY, 253600, 578759
  • Utah, UT, 84899, 3205958
  • Arizona, AZ, 295234, 7278717
  • New Mexico, NM, 314917, 2096829
  • California, CA, 423970, 39368078
  • Nevada, NV, 286382, 3080156
  1. Create a tibble for the highest peaks in the 7 states, with the following names, elevations in m, longitude and latitude, and add them to that map:
  • Wheeler Peak, 4011, -105.4, 36.5
  • Mt. Whitney, 4421, -118.2, 36.5
  • Boundary Peak, 4007, -118.35, 37.9
  • Kings Peak, 4120, -110.3, 40.8
  • Gannett Peak, 4209, -109, 43.2
  • Mt. Elbert, 4401, -106.4, 39.1
  • Humphreys Peak, 3852, -111.7, 35.4

Note: the easiest way to do this is with the tribble function, starting with:

peaks <- tribble(
  ~peak, ~elev, ~longitude, ~latitude,
  "Wheeler Peak", 4011, -105.4, 36.5,
  1. Use a spatial join to add the points to the states to provide a new attribute maximum elevation, and display that using geom_sf_text() with the state polygons.

  2. Even though the result isn’t terribly useful, send that spatially joined data to the base plot system to see what you get.

  3. From the CA_counties and CAfreeways feature data in iGIScData, make a simple map in ggplot, with freeways colored red.

  4. After adding the raster library, create a raster from the built-in volcano matrix of elevations from Auckland’s Maunga Whau Volcano, and use plot() to display it. We’d do more with that dataset but we don’t know what the cell size is.

  5. Use tmap to create a simple map from the SW_States (polygons) and peaksp (points) data we created earlier. Hints: you’ll want to use tm_text with text set to “peak” to label the points, along with the parameter auto.placement=TRUE.

  6. Change the map to the view mode, but don’t use the state borders since the basemap will have them. Just before adding shapes, set the basemap to leaflet::providers$Esri.NatGeoWorldMap, then continue to the peaks after the + to see the peaks on a National Geographic basemap.